1. INTRODUCTION

Recognition of handwritten characters from document images is an open research problem in the
area of optical character recognition (OCR). Achieving high accuracy is especially demanding for
handwritten documents. In this context, recognition of South Indian languages is more challenging than
recognition of North Indian handwritten documents. In particular, recognition of Kannada handwritten
characters is difficult because of the large number of character classes and their complex geometrical
topography [1]. The Kannada character set for recognition is typically composed of 13 vowels and
25 consonants, as shown in Figure 1.

In addition to the basic Kannada character set, 13 vowel conjuncts and 25 consonant conjuncts
are present, as shown in Figure 2. The complexity of character recognition depends on the character
set and its number of classes. In Kannada, the number of character classes is 13*25*25 = 8,125 [2]. An
increase in the number of classes aggravates the challenges in feature analysis and classification.
Additionally, challenges inherent in handwritten datasets are a barrier to higher recognition accuracy.

Many research investigations have been reported on the classification of handwritten Kannada
characters. The classification techniques are based on machine learning and deep learning
procedures [3], [4]. Although machine learning based techniques perform well on
Kannada character classes, the results are appreciable only for a limited number of classes. Thus, research
efforts towards classification of large character sets rely on deep learning
procedures. OCR technologies are generally developed to address script-specific recognition needs. An OCR
system that performs well on one script often performs poorly on other scripts. The performance of
OCR also depends on whether the data is printed or handwritten, the number of writing styles, the script
type and the document layout. Some OCR systems are template based and follow strict rules during the
input image acquisition stage. Several recent studies [5]-[10] proposed handwritten character
classification using deep learning based models.


Figure 1. Kannada character set 


Figure 2. Kannada character set vowels and consonants conjuncts 


The literature contains a number of research attempts in the area of Kannada handwritten
character recognition. Among the significant ones, Joe et al. [11] proposed an offline character
recognition system for handwritten characters using a convolutional neural network and showed that
hand-crafted features are not required for handwritten character classification. In another work,
Ramesh et al. [12] also employed a convolutional neural network for handwritten Kannada character
recognition and obtained an accuracy of 93.2% on classification of basic vowels and consonants. Further,
Sandhya et al. [13] investigated a framework for degraded Kannada character recognition that addresses
character stroke breakages and complex compound characters in printed documents using machine learning
based techniques. In another work, Ucar et al. [14] used a capsule network for classification of Kannada
handwritten digit datasets. A comparison with convolutional neural networks showed that the capsule
network model provides better efficiency.

Mahapatra et al. [15] applied autoencoder-based learning to classify handwritten Kannada characters
using KNN and SVM classifiers. Later, capsule networks for recognition of low-resource languages such as
Kannada were investigated by Abeysinghe and Perera [16]. The applicability of the model was extended to
Oriya and Sinhala character sets. In a different work, Ramesh and Kumar [17] applied a
convolutional neural network for recognition of Kannada handwritten words, with an SVM classifier employed
as part of the fully connected layer. Mamatha [18] proposed a dataset for handwritten Kannada vowels and
classified local binary pattern, run length count and chain code features using K-means clustering.
Mahapathra et al. [19] proposed generator-based methods for offline handwritten character recognition
using a convolutional autoencoder with a GAN architecture. Indira and Selvi [20] reviewed various methods
for Kannada printed character recognition. In another work, Pasha and Padma [21] adapted the wavelet
transform and structural features for Kannada handwritten character recognition; the experiments were
conducted on handwritten numerals.

Recognition of Arabic characters was proposed by Ahamad et al. [22] using convolutional neural
networks trained with different learning rates. In a subsequent work, Boufenar et al. [23] classified
handwritten Arabic characters using convolutional networks through a transfer learning approach. Following
this, Vaidya et al. [24] carried out handwritten character classification using a deep convolutional
model, with experiments conducted on handwritten digits. A character recognition model was proposed by
Ram et al. [25] using a deep convolutional neural network. In another work, Wick et al. [26] applied deep
learning packages for optical character recognition. Zhao et al. [27], Selmi et al. [28],
Iamsa et al. [29], Nair et al. [30] and Alif et al. [31] used a variety of deep learning architectures for
various handwritten character recognition problems.

From the literature, it is evident that deep learning systems are widely used for recognition of
handwritten characters. However, classification of South Indian scripts has received little attention.
Although the deep learning architectures proposed for character and digit classification are successful,
high accuracy in handwritten character recognition has not yet been realized. In a few
works, satisfactory progress has been achieved for North Indian scripts based on Devanagari. Thus, there is
scope for applying deep learning based models to handwritten Kannada character recognition. In the proposed
work, a capsule network based deep learning model is applied to classify Kannada handwritten characters.


2. PROPOSED METHOD 

Capsule networks (CN) aim to represent geometrical relationships within artificial neural
networks. A CN is a type of convolutional neural network that provides a way of reusing the output
produced in a specific iteration to stabilize the outputs obtained in subsequent iterations. CNs are
notably effective at interpreting feature maps of various images, which helps in image
classification and recognition tasks. The features employed for learning are present at the core of the
convolution layers in CNs. For the handwritten character recognition problem, learning at the level of
pixels with respect to edges and color suffices for the image recognition task. Unlike a convolutional
neural network (CNN), a CN is less sensitive to small changes in parts of the image. Therefore, in the
proposed method, an attempt is made to classify handwritten Kannada characters using CNs.
Figure 3 depicts the architecture of the capsule network used to classify handwritten Kannada
characters. The architecture comprises an input layer followed by two levels of convolution layers,
two levels of capsule layers with squash and squeeze functions, and a multilevel upsampling stage followed
by an output layer.
Figure 3. Architecture diagram of capsule network

The input layer is built of m*n neurons for an image of dimension m*n and is followed by
convolution layers C1 and C2. C1 is built of 256 convolution kernels of dimension 9*9, and C2 further
down samples the output of C1. Convolution layer C2 is composed of 64 convolution kernels, each of
dimension 6*6. Subsequently, the down-sampled outputs of convolution layer C2 are passed to the capsule
layers in two levels. In a CNN, each node in the input layer represents a pixel, and the input data from
the image is passed to the subsequent layers for down sampling. The down sampling process loses positional
attributes, which results in loss of the location of object structures and of their gradient directions.
Therefore, capsule layers are introduced in the proposed architecture to retain more information about
object structures in terms of position, size and orientation. Thus the information needed to perform
Kannada handwritten character classification can be stored in the form of capsules. Capsules are vectors
that emphasize the positional attributes of objects along with spatial invariance information. Each
capsule carries a degree of features across a range of spatial variations adequate to handle the
handwritten Kannada character recognition task. Hence the information in a particular capsule helps in
better identification of a character. In the proposed architecture, a capsule vector of length 16 is
employed in layer 1, with a capsule size of 5 and 16 filters.
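
The spatial sizes implied by the two convolution layers can be sketched with the standard convolution output-size formula. The stride and padding are not stated here, so stride 1 and no padding are assumptions made only for illustration. The helper name `conv_output_size` is hypothetical, and the 28*28 input is the image size reported in the experiments.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a convolution output along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# 28*28 grayscale input, as reported in the experiments.
input_size = 28

# C1: 256 kernels of 9*9 (stride 1 and no padding are assumptions).
c1_size = conv_output_size(input_size, kernel=9)

# C2: 64 kernels of 6*6 (stride 1 and no padding are assumptions).
c2_size = conv_output_size(c1_size, kernel=6)

print(f"C1 output: {c1_size}*{c1_size}*256")
print(f"C2 output: {c2_size}*{c2_size}*64")
```

A larger stride or added padding would change these sizes, so the figures hold only under the stated assumptions.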

In capsule layer 2, a capsule vector length of 32 with 49 capsules and a routing step size of
one is adopted. The length of a capsule vector represents the probability of existence of an object with
its orientation details and also acts as one of the instantiation parameters. Activation of capsules at
layer 1 assists in making predictions and in instantiating parameters in the subsequent capsule layers. In
level one, 16 nodes are used in each capsule, representing 16 different feature dimensions for one
character. The output of capsule layer one is a vector comprising properties such as width, scale, stroke
thickness and square of a particular character. The elements of the capsule vector are bounded in the
interval 0-1, indicating the probability of affinity to each class. When multiple predictions from the two
capsule layers agree, the corresponding neurons in the higher-level capsules are activated. These
activated neurons are then subject to squash and squeeze functions. Squash and
squeeze are activation functions applied over the outputs produced by the capsule layers [32].
These functions are mainly useful in inferring nonlinear relationships within the data and typically act as
activation layers by returning a vector whose elements fall in the range of 0 to 1. Squash is a special
activation function that normalizes the scaled output values produced by the capsule layers.
If Vec_j represents the output vector returned by the squash function, j represents a capsule in layer l,
and sca_j represents the vector of scalars produced by capsule j in layer l, then the activation obtained
through the squash function is given by (1).


Vec_j = \frac{\|sca_j\|^2}{1 + \|sca_j\|^2} \cdot \frac{sca_j}{\|sca_j\|}    (1)


Thus Vec_j is a normalized vector returned by applying the squash function on sca_j. Further, Vec_j
determines the route through which data from the various capsules is passed for subsequent training.
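
The squash function of (1) can be sketched in plain Python as follows. This is a minimal illustration for a single capsule vector, not the implementation used in the experiments; the function name `squash` and the small constant `eps`, which guards against division by zero, are assumptions.

```python
import math


def squash(sca, eps=1e-9):
    """Apply the squash function of (1) to one capsule's raw output vector."""
    squared_norm = sum(x * x for x in sca)
    norm = math.sqrt(squared_norm)
    # Scale factor ||sca||^2 / (1 + ||sca||^2) lies in [0, 1).
    scale = squared_norm / (1.0 + squared_norm)
    # eps avoids division by zero for an all-zero vector (an assumption).
    return [scale * x / (norm + eps) for x in sca]


vec = squash([3.0, 4.0])
length = math.sqrt(sum(x * x for x in vec))
print(vec, length)
```

The direction of the input vector is preserved while its length is compressed below 1, which is what allows the length to be read as a probability.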

Figure 4 illustrates how data is routed after applying the squash function. The outputs
obtained from the routing capsule layers are further subject to excitation using the squeeze function.
Channel-wise feature map vectors are recalibrated using squeeze by explicitly modeling the
inter-dependencies between the features. This increases accuracy through a better representation of
channel-wise features for classification. Following the squeeze excitation, there are repeated
convolutions at level 1, level 2 and level 3 through convolution layer 3, convolution layer 4 and
convolution layer 5. Additionally, a softmax activation function is applied on the up-sampled output
vectors produced by the up-convolution process. The softmax function maps the output vectors to the
desired number of classes.
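
The final softmax step can be sketched as below. The scores are hypothetical values chosen for illustration, and the function is a generic softmax rather than code taken from the proposed model.

```python
import math


def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for 3 of the 49 classes, for illustration only.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The predicted class is the index of the largest probability, so for 49 character classes the score list would have 49 entries.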


[Diagram: convolution layers C1 and C2, primary capsule layer, routing capsule layer]

Figure 4. Routing of data-capsule layers using SQUASH
3. EXPERIMENTATION

In the proposed work, the performance of the CN is tested with 7,769 training samples, 245 test
samples and 490 validation samples. The training and validation datasets are the samples that are
held back for learning and unbiased performance evaluation. The test dataset is collected separately from
about 200 users with references of 1-2 samples for each of the 49 classes, resulting in 245 samples
(5 per class). To estimate the performance of the trained model, validation is conducted with 490 samples
for 49 classes, with 10 samples in each class. Validation data is mainly used to estimate the accuracy of
the CN. Table 1 shows the details of the datasets used for experimentation for all 49 classes.


Table 1. Dataset details of Kannada handwritten character recognition 


Sl. No  Train  Test  Validation  Class label    Sl. No  Train  Test  Validation  Class label
1       218    5     10          1              26      408    5     10          26
2       201    5     10          2              27      213    5     10          27
3       269    5     10          3              28      100    5     10          28
4       470    5     10          4              29      75     5     10          29
5       205    5     10          5              30      10     5     10          30
6       203    5     10          6              31      10     5     10          31
7       204    5     10          7              32      10     5     10          32
8       221    5     10          8              33      133    5     10          33
9       416    5     10          9              34      82     5     10          34
10      241    5     10          10             35      15     5     10          35
11      206    5     10          11             36      10     5     10          36
12      225    5     10          12             37      10     5     10          37
13      248    5     10          13             38      192    5     10          38
14      204    5     10          14             39      173    5     10          39
15      112    5     10          15             40      150    5     10          40
16      215    5     10          16             41      148    5     10          41
17      228    5     10          17             42      100    5     10          42
18      202    5     10          18             43      44     5     10          43
19      132    5     10          19             44      77     5     10          44
20      238    5     10          20             45      110    5     10          45
21      205    5     10          21             46      10     5     10          46
22      110    5     10          22             47      20     5     10          47
23      351    5     10          23             48      125    5     10          48
24      10     5     10          24             49      200    5     10          49
25      10     5     10          25
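
The totals quoted for the datasets can be cross-checked against Table 1 with a short tally. The per-class training counts below are transcribed from the table. The threshold of 20 samples used to flag sparsely represented classes is an arbitrary choice made for illustration.

```python
# Training samples per class label (1..49), transcribed from Table 1.
train_counts = [
    218, 201, 269, 470, 205, 203, 204, 221, 416, 241,
    206, 225, 248, 204, 112, 215, 228, 202, 132, 238,
    205, 110, 351, 10, 10, 408, 213, 100, 75, 10,
    10, 10, 133, 82, 15, 10, 10, 192, 173, 150,
    148, 100, 44, 77, 110, 10, 20, 125, 200,
]
TEST_PER_CLASS = 5
VALIDATION_PER_CLASS = 10

num_classes = len(train_counts)
total_train = sum(train_counts)
total_test = num_classes * TEST_PER_CLASS
total_validation = num_classes * VALIDATION_PER_CLASS

# Classes with at most 20 training samples (illustrative threshold).
scarce = [label for label, n in enumerate(train_counts, start=1) if n <= 20]
imbalance_ratio = max(train_counts) / min(train_counts)

print(num_classes, total_train, total_test, total_validation)
print("Sparsely represented classes:", scarce)
print("Largest-to-smallest class ratio:", imbalance_ratio)
```

The tally reproduces the 7,769 training, 245 test and 490 validation samples stated in the text, and it shows how unevenly the training samples are spread across classes.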


A mini-batch gradient descent approach with a batch size of 50 and 7,000 training steps
is used for evaluation. The learning frameworks used include TensorFlow, Keras 2.1.5 and OpenCV
4.5.1, and the primary development environment is Spyder. The computing resources employed for
simulation include an ASUS laptop with 16 GB RAM, an Intel Core i7 processor and an additional 4 GB of
GPU memory. Each image sample considered for experimentation is a grayscale image of 28*28 pixels. The
model is tested with different hyperparameters by varying the batch size from 10 to 50. Samples of the
training, validation and test datasets are randomly chosen, and a few are shown in Figure 5.
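
The relation between batch size, training steps and passes over the data can be worked out with a short calculation. It assumes that every training step consumes one full batch, which is not stated explicitly; the function name `training_schedule` is hypothetical.

```python
import math


def training_schedule(num_samples, batch_size, num_steps):
    """Return batches needed per epoch and the number of epochs covered."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    # Assumes each step processes exactly one full batch.
    samples_seen = batch_size * num_steps
    epochs = samples_seen / num_samples
    return batches_per_epoch, epochs


batches_per_epoch, epochs = training_schedule(
    num_samples=7769, batch_size=50, num_steps=7000
)
print(batches_per_epoch, round(epochs, 2))
```

Under this assumption, 7,000 steps at a batch size of 50 correspond to roughly 45 passes over the 7,769 training samples.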


Training Validation Testing 


Figure 5. Sample instances from training validation and testing 
The margin loss obtained after instantiation of vectors from the routing capsules with respect to the
training data is shown in Figure 6. About 10-12 hours are needed to train the CN with 7,769
samples of 49 classes and a routing step size of 0.01. The reconstruction loss obtained over 7,000 training
steps after the routing process via the primary and routing capsules is shown in Figure 7.


[Plot: margin loss versus number of training steps, 0 to 7k]
Figure 6. Margin loss-training and validation samples

[Plot: reconstruction loss versus number of training steps, 0 to 7k]
Figure 7. Reconstruction loss-training and validation samples


A loss of 0.66 is observed at the end of 7,000 training steps with the validation dataset
samples, and the total loss for all 49 character classes from the capsule network is found to be less
than 3%. If M represents the margin loss, R the reconstruction loss, T the total loss and \beta the scaling
factor, then the total loss T is given by (2). Figure 8 shows the total loss obtained during classification.


T = M + \beta R    (2)
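
A total loss of the form in (2) combines a margin term with a scaled reconstruction term, which can be sketched as follows. The margin-loss form and the constants (0.9, 0.1, 0.5 and a scaling factor of 0.0005) follow the original capsule network formulation by Sabour et al. and are assumptions here, since the values used in this work are not stated. The capsule lengths and the reconstruction value are hypothetical.

```python
def margin_loss(lengths, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss over output capsule lengths for one sample.

    lengths: length of each class capsule's output vector, in [0, 1).
    target: index of the true class.
    The constants are assumed defaults, not values reported in this work.
    """
    loss = 0.0
    for k, length in enumerate(lengths):
        if k == target:
            # Penalize a true-class capsule that is shorter than m_plus.
            loss += max(0.0, m_plus - length) ** 2
        else:
            # Penalize other capsules that are longer than m_minus.
            loss += lam * max(0.0, length - m_minus) ** 2
    return loss


def total_loss(margin, reconstruction, beta=0.0005):
    """Margin loss plus the reconstruction loss scaled by beta."""
    return margin + beta * reconstruction


# Hypothetical capsule lengths for a 3-class toy case; the true class is 0.
m = margin_loss([0.5, 0.5, 0.05], target=0)
print(m, total_loss(m, reconstruction=10.0))
```

A small scaling factor keeps the reconstruction term from dominating the margin term during training.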


The overall accuracy obtained is 97%, with an average accuracy of 95% across all classes.
Figure 9 depicts the accuracy obtained in classifying handwritten Kannada characters. Table 2
shows the performance metrics in terms of precision, recall and F1-score for every class, together with
the overall accuracy. From Table 2, it is evident that 48 out of 49 classes are classified with an average
accuracy of more than 91.5%, i.e. 4 out of 5 samples are correctly classified on average for 18 classes
and 5 out of 5 samples are correctly classified for all the remaining classes. Experiments on the proposed
datasets are also conducted using the Inception V3 network, for which the accuracy of handwritten Kannada
character recognition is found to be 95%. Cross entropy is also evaluated to measure the distance between
the predicted and actual class labels; it represents the loss incurred in Kannada handwritten character
recognition. Figures 10 and 11 depict the performance of the Inception V3 network in terms of accuracy
versus number of iterations and cross entropy.
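
The cross entropy for a single sample can be sketched as the negative log of the probability assigned to the true class. This is a generic illustration with hypothetical probabilities, not the evaluation code used for the Inception V3 network; the clipping constant `eps` is an assumption.

```python
import math


def cross_entropy(probs, true_class, eps=1e-12):
    """Cross entropy between a one-hot label and predicted class probabilities."""
    # Clip to eps so that a zero probability does not produce log(0).
    return -math.log(max(probs[true_class], eps))


# Hypothetical prediction over two classes; the true class is 1.
loss = cross_entropy([0.25, 0.75], true_class=1)
print(loss)
```

The loss is zero when the true class receives probability 1 and grows as that probability shrinks.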

The experiments show that the CN architecture performs better than the Inception V3 network
(97% versus 95% accuracy). The loss incurred with the CN is also lower than with Inception V3.
Moreover, the CN reduces the loss of features from the image during the convolution process. Thus,
the robustness of the CN is higher than that of the Inception V3 network.


[Plot: total loss versus number of training steps, 0 to 7k]
Figure 8. Total loss training and validation samples

[Plot: accuracy versus number of training steps, 0 to 7k]
Figure 9. Accuracy-training vs validation samples
Table 2. Performance metrics of capsule networks with test datasets

Class label  Precision  Recall  F1-Score     Class label  Precision  Recall  F1-Score
1            1.00       1.00    1.00         26           1.00       1.00    1.00
2            1.00       1.00    1.00         27           0.95       1.00    0.97
3            1.00       1.00    1.00         28           0.77       1.00    0.87
4            1.00       1.00    1.00         29           1.00       1.00    1.00
5            1.00       1.00    1.00         30           0.90       1.00    0.95
6            1.00       1.00    1.00         31           1.00       1.00    1.00
7            1.00       1.00    1.00         32           0.96       1.00    0.98
8            1.00       1.00    1.00         33           0.92       1.00    0.96
9            1.00       1.00    1.00         34           0.83       1.00    0.91
10           0.99       1.00    1.00         35           1.00       1.00    1.00
11           1.00       1.00    1.00         36           0.80       0.80    0.80
12           1.00       1.00    1.00         37           1.00       1.00    1.00
13           1.00       1.00    1.00         38           1.00       0.80    0.89
14           1.00       1.00    1.00         39           1.00       0.80    0.89
15           0.93       1.00    0.97         40           0.00       0.00    0.00
16           1.00       1.00    1.00         41           1.00       1.00    1.00
17           0.91       1.00    0.95         42           1.00       1.00    1.00
18           1.00       1.00    1.00         43           0.75       0.60    0.67
19           1.00       1.00    1.00         44           0.86       1.00    0.92
20           1.00       1.00    1.00         45           0.94       1.00    0.97
21           1.00       1.00    1.00         46           1.00       0.80    0.89
22           1.00       0.89    0.94         47           1.00       1.00    1.00
23           1.00       1.00    1.00         48           1.00       1.00    1.00
24           1.00       1.00    1.00         49           1.00       1.00    1.00
25           1.00       1.00    1.00

Overall Accuracy   0.97
Macro Average      0.95
Weighted Average   0.97
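
The F1-score column of Table 2 is the harmonic mean of precision and recall, which can be checked for a few rows as sketched below. The precision and recall values are copied from the table; the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs for selected class labels, copied from Table 2.
rows = {28: (0.77, 1.00), 36: (0.80, 0.80), 40: (0.00, 0.00), 43: (0.75, 0.60)}
for label, (p, r) in rows.items():
    print(label, round(f1_score(p, r), 2))
```

Because the tabulated precision and recall are rounded to two decimals, a recomputed score can differ from the table in the last digit for some rows.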

[Plot: accuracy_1 versus number of iterations, 0 to 4k]
Figure 10. Accuracy of the inception V3 network

[Plot: cross_entropy_1 versus number of iterations, 0 to 4k]
Figure 11. Cross entropy of inception V3 network


4. CONCLUSION 

Deep learning networks have been used extensively in recent years to deal with a variety of pattern
recognition problems. In this work, we have investigated the applicability of the CN to classification of
handwritten Kannada characters. A CN architecture with tunable hyperparameters is proposed.
The proposed model is compared with the Inception V3 model, and the CN model is found to achieve an
accuracy of 97%. In future, the model can be extended to classify simple compound and multi-compound
characters with an increased number of data samples.